Supplementary Materials for

categories: natural water; reservoirs; canals; wetlands; recreation; conservation; timber; grazing; pasture; cropland; mining; or barren land. We define a point as semi-developed if it has any of the following ICLUS categories: exurban, low; suburban; urban, low; parks; or golf courses. We define a point as developed if it has any of the following ICLUS categories: exurban, high; urban high;

MS analysis was performed on an Orbitrap (QExactive, Thermo Fisher Scientific, Waltham, MA) mass spectrometer equipped with HESI-II probe sources and controlled by Xcalibur 3.0 software. The following probe settings were used on both MS instruments for flow aspiration and ionization: spray voltage of 3500 V, sheath gas (N2) pressure of 35 psi, auxiliary gas (N2) pressure of 10 psi, ion source temperature of 270 °C, S-lens RF level of 50 Hz, and auxiliary gas heater temperature of 440 °C.
For Orbitrap MS, spectra were acquired in positive ion mode over a mass range of 100-1500 m/z. An external calibration with Pierce LTQ Velos ESI positive ion calibration solution (Thermo Fisher Scientific, Waltham, MA) was performed prior to data acquisition, with a mass error of less than 1 ppm. Data acquisition parameters were set as follows: minutes 0-0.5 were sent to waste; minutes 0.1-12 were recorded in data-dependent MS/MS acquisition mode. Full scans at the MS1 level were performed with a resolution of 35K in profile mode. The 10 most intense ions per MS1 scan were selected with a 2 m/z isolation window and a 0.5 m/z offset and subjected to normalized collision-induced dissociation at 30 eV. MS2 scans were performed at 17.5K resolution with a maximum injection time (IT) of 60 ms in profile mode. The MS/MS active exclusion parameter was set to 5.0 s.
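As a worked illustration of the ppm figures quoted above (calibration error below 1 ppm), the following minimal sketch computes a signed mass error in parts per million. The function names and the measured value are hypothetical and chosen only for illustration; the theoretical m/z of protonated caffeine is used as a convenient reference ion.

```python
def ppm_error(measured_mz: float, theoretical_mz: float) -> float:
    """Signed mass error in parts per million (ppm)."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


def within_tolerance(mz_a: float, mz_b: float, tol_ppm: float) -> bool:
    """True if mz_a lies within tol_ppm of mz_b."""
    return abs(ppm_error(mz_a, mz_b)) <= tol_ppm


# Protonated caffeine [M+H]+ has a theoretical m/z of about 195.0877.
theoretical = 195.0877
measured = 195.0878  # hypothetical instrument reading, for illustration only
error = ppm_error(measured, theoretical)
calibrated = within_tolerance(measured, theoretical, tol_ppm=1.0)
```

A deviation of 0.0001 m/z at this mass corresponds to roughly half a ppm, which is why sub-ppm calibration requires agreement in the fourth decimal place.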

LC-MS Data Processing
The LC-MS/MS .raw data files were converted to mzXML format and feature detection was performed with the MZmine2 software (3). The software settings were as follows. Mass detection was performed with a signal threshold of 1.0E3 for MS1 and 1.0E2 for MS2. For chromatogram building, the mass tolerance was set to 10 ppm, the minimum peak time span to 0.01 s, and the minimum height to 5.0E3. For chromatographic deconvolution, the local minimum search algorithm was used; the m/z range for MS2 scan pairing was set to 0.025 Da and the RT range to 0.1 min. The peaks were de-isotoped within 25 ppm m/z and 0.2 min RT tolerances, aligned, gap-filled using the same tolerances, and then filtered to retain only peaks that appear in at least 2 samples with a minimum of 2 peaks in the isotope pattern to create the feature table. Peaks present in any of the blanks were removed from the final feature table unless at least one sample contained the peak at an abundance 3x or above that in the blanks.
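The occurrence and blank filters described above can be sketched as follows. This is not the MZmine2 implementation; it is a minimal illustration that assumes the "3x" criterion is measured against the highest abundance observed in any blank. The table layout, column names, and feature identifiers are hypothetical.

```python
def filter_features(table, sample_cols, blank_cols, min_samples=2, fold=3.0):
    """Keep features seen in >= min_samples samples and not attributable to blanks.

    table: dict mapping feature id -> dict of column name -> peak area.
    """
    kept = {}
    for feature_id, areas in table.items():
        sample_areas = [areas.get(c, 0.0) for c in sample_cols]
        blank_max = max((areas.get(c, 0.0) for c in blank_cols), default=0.0)

        # Occurrence filter: the peak must appear in at least `min_samples` samples.
        if sum(a > 0 for a in sample_areas) < min_samples:
            continue

        # Blank filter (assumption: compare against the highest blank abundance).
        # A peak found in a blank survives only if some sample reaches fold x blank.
        if blank_max > 0 and max(sample_areas) < fold * blank_max:
            continue

        kept[feature_id] = areas
    return kept


# Hypothetical feature table with two samples and one blank.
table = {
    "F1": {"S1": 900.0, "S2": 1200.0, "B1": 0.0},   # absent from blank: kept
    "F2": {"S1": 500.0, "S2": 400.0, "B1": 300.0},  # below 3x blank: removed
    "F3": {"S1": 1000.0, "S2": 200.0, "B1": 300.0}, # one sample above 3x: kept
    "F4": {"S1": 800.0, "S2": 0.0, "B1": 0.0},      # only one sample: removed
}
filtered = filter_features(table, ["S1", "S2"], ["B1"])
```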

3D Data Visualization
The aligned features were then exported as .csv and combined with the metadata in RStudio (R) to create a master table for further statistical analysis. The master table was split into individual tables for Time 1 and Time 2 for further mapping. The tables were then quantile normalized using MetaboAnalyst (4) and exported. The 3D model was created from the CAD drawing of the test house. A target was placed at each location of the 3D model where the corresponding sample in the test house was collected. The coordinates for each target in the house 3D model were then added to the normalized tables. For visualization, the 3D model of the house was dragged and dropped into 'ili (2) (https://ili.embl.de/), followed by the feature table with coordinates. The input tables used for mapping are available at: https://github.com/aaksenov1/HOMEChem-3D-mapping-input-files.
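For readers unfamiliar with quantile normalization, the idea can be sketched in a few lines. This is a generic textbook version, not the MetaboAnalyst code: it assumes features in rows and samples in columns, and it breaks ties by original order rather than averaging them. The example matrix is hypothetical.

```python
def quantile_normalize(matrix):
    """Quantile-normalize the columns of a row-major matrix (rows=features, cols=samples)."""
    n_rows = len(matrix)
    n_cols = len(matrix[0])
    columns = [[row[j] for row in matrix] for j in range(n_cols)]

    # Reference distribution: mean of the k-th smallest value across all samples.
    sorted_cols = [sorted(col) for col in columns]
    reference = [sum(sc[k] for sc in sorted_cols) / n_cols for k in range(n_rows)]

    normalized = [[0.0] * n_cols for _ in range(n_rows)]
    for j, col in enumerate(columns):
        # Rank each value within its sample (ties broken by original order),
        # then replace it with the reference value of the same rank.
        order = sorted(range(n_rows), key=lambda i: col[i])
        for rank, i in enumerate(order):
            normalized[i][j] = reference[rank]
    return normalized


# Hypothetical 4-feature x 3-sample intensity table.
raw = [
    [5.0, 4.0, 3.0],
    [2.0, 1.0, 4.0],
    [3.0, 4.0, 6.0],
    [4.0, 2.0, 8.0],
]
normalized = quantile_normalize(raw)
```

After normalization every sample column contains the same set of values, so between-sample differences in overall intensity distribution are removed while the within-sample ranking of features is preserved.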
For visualization, the "Jet" color scheme was used throughout for molecular mapping and "Viridis" for microbiome mapping. When the visualized data were divergent and centered around zero (log-ratios), a "Blue-Red" color scheme was used. The scale was set as either linear or logarithmic for visualization clarity (the scale is indicated on each figure as appropriate).

Molecular Networking
A molecular network was created with the classical (5) and Feature-Based Molecular Networking (FBMN) (6) workflows on GNPS (https://gnps.ucsd.edu) (7). The mass spectrometry data were first processed with MZmine2 (3) and the results were exported to GNPS for FBMN analysis. The data were filtered by removing all MS/MS fragment ions within +/-17 Da of the precursor m/z. MS/MS spectra were window filtered by choosing only the top 6 fragment ions in the +/-50 Da window throughout the spectrum. The precursor ion mass tolerance was set to 0.02 Da and the MS/MS fragment ion tolerance to 0.02 Da. A molecular network was then created where edges were filtered to have a cosine score above 0.7 and more than 6 matched peaks. Further, edges between two nodes were kept in the network if and only if each of the nodes appeared in the other's respective top 10 most similar nodes. Finally, the maximum size of a molecular family was set to 100, and the lowest scoring edges were removed from molecular families until the molecular family size was below this threshold. The spectra in the network were then searched against GNPS spectral libraries (7). The library spectra were filtered in the same manner as the input data. All matches kept between network spectra and library spectra were required to have a score above 0.7 and at least 6 matched peaks. The molecular networks were visualized using Cytoscape software (8).
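The edge-filtering rules described above (cosine score, matched peaks, and mutual top-10 membership) can be illustrated with a short sketch. This is not the GNPS code; the edge tuple layout is hypothetical, and computing the top-k lists after the score thresholds are applied is an assumption made here for simplicity. The family-size cap of 100 is not implemented.

```python
from collections import defaultdict


def filter_edges(edges, min_cosine=0.7, min_matched=6, top_k=10):
    """edges: list of (node_a, node_b, cosine, matched_peaks) tuples."""
    # Step 1: score and matched-peak thresholds ("above 0.7", "more than 6").
    passing = [e for e in edges if e[2] > min_cosine and e[3] > min_matched]

    # Step 2: collect each node's neighbors and keep its top_k by cosine score.
    neighbors = defaultdict(list)
    for a, b, cos, _ in passing:
        neighbors[a].append((cos, b))
        neighbors[b].append((cos, a))
    top = {
        node: {other for _, other in
               sorted(pairs, key=lambda p: p[0], reverse=True)[:top_k]}
        for node, pairs in neighbors.items()
    }

    # Step 3: keep an edge only if each node is in the other's top_k list.
    return [e for e in passing if e[1] in top[e[0]] and e[0] in top[e[1]]]


# Hypothetical pairwise spectral similarities.
edges = [
    ("A", "B", 0.95, 8),
    ("A", "C", 0.80, 7),
    ("C", "D", 0.90, 9),
    ("B", "E", 0.65, 10),  # fails the cosine threshold
    ("D", "E", 0.85, 6),   # fails "more than 6 matched peaks"
]
default_network = filter_edges(edges)
strict_network = filter_edges(edges, top_k=1)
```

With `top_k=1` the A-C edge is dropped because C is not A's single best match, which shows how the mutual top-k rule prunes weaker links from highly connected nodes.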

Mass Shift Analysis
For chemical shift analyses unbiased by annotation, the chemical shift space was discretized by binning into 3000 bins and removing bins with no occupancy, resulting in 575 discrete count features.
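The discretization step above can be sketched as an equal-width histogram from which empty bins are dropped. The bin range, the equal-width choice, and the example shift values below are assumptions for illustration; the source states only the number of bins (3000) and the number of occupied bins retained (575).

```python
from collections import Counter


def bin_counts(values, lower, upper, n_bins=3000):
    """Histogram `values` into n_bins equal-width bins; return only occupied bins."""
    width = (upper - lower) / n_bins
    counts = Counter()
    for v in values:
        if not (lower <= v <= upper):
            continue  # ignore values outside the binned range
        # Clamp so that v == upper falls into the last bin.
        index = min(int((v - lower) / width), n_bins - 1)
        counts[index] += 1
    # Bins that were never incremented are absent, i.e. removed.
    return dict(sorted(counts.items()))


# Hypothetical shift values and an assumed 0-300 range.
shifts = [14.016, 14.015, 15.995, 18.011, 200.05]
features = bin_counts(shifts, lower=0.0, upper=300.0)
```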
Chemical shift distances between locations were computed using the Bray-Curtis distance in vegan (2.5-6). PCoA projections were computed using the ape (5.3) pcoa function. Clear separation was observed for 13 locations because of very low feature coverage in either of the two underlying samples from that location (Fig. S5). These were removed as outliers. Distances to the centroid were calculated using the betadisper function in vegan.

FoodOmics Analysis: determination of the food sources
Reference Data-Driven Analysis using Global FoodOmics data
A description of the methods, code and tutorial for generation of Fig. S7a can be found at https://ccms-ucsd.github.io/GNPSDocumentation/tutorials/rdd/ and is linked out to github and the MassIVE repository, as shown in Gauglitz et al. (9).

Microbiome Sample Prep and Sequencing
Both swabs of each microbiome sampling kit were extracted following the standardized Earth Microbiome Project (EMP) protocols (http://www.earthmicrobiome.org/protocols-and-standards/16s) (16). Briefly, DNA was extracted using the MagAttract PowerSoil® DNA Kit (QIAGEN) on a KingFisher Flex (ThermoFisher). The V4 region of the 16S rRNA gene was targeted for PCR amplification using the 515f-806r primers with Golay error-correcting barcodes. The barcoded 16S amplicons were pooled in equal concentrations and the pool was purified with a QIAquick PCR purification kit (QIAGEN). The purified pool was sequenced with a MiSeq V2 300 cycle kit (Illumina) with the appropriate sequencing primers.

Microbiome Data Analysis
Sequence data were demultiplexed, quality filtered, and trimmed to 150 bp using QIITA (17). Trimmed sequences were error-filtered using Deblur (18), resulting in an sOTU (18) feature table. Taxonomy was assigned on representative sOTU sequences using a pre-fitted GreenGenes classifier in QIIME2 (19). Upon analysis of rarefaction curves, samples were rarefied to a depth of 5000 sequences, resulting in the retention of 80% of the samples and ~25% of features, with a relatively even distribution of sample retention between time point T1 (n=232) and time point T2 (n=221). Due to the compositional nature of sequencing data, feature count comparisons between time points were done as log-ratios with a reference frame (20) in the denominator. The reference frame feature ('k__Bacteria;p__Cyanobacteria;c__Chloroplast;o__Streptophyta;f__;g__') was chosen because it was observed across most samples.
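The log-ratio comparison with a reference frame described above can be sketched as follows. This is an illustration rather than the analysis code used in the study: the data layout and feature names are hypothetical, the natural logarithm is an arbitrary choice, and skipping samples in which either count is zero (instead of adding a pseudocount) is an assumption.

```python
import math


def log_ratios(counts, feature, reference):
    """Per-sample natural-log ratio of `feature` to `reference` counts.

    counts: dict mapping sample id -> dict of feature id -> read count.
    Samples where either count is zero are skipped (ratio undefined).
    """
    ratios = {}
    for sample, features in counts.items():
        num = features.get(feature, 0)
        den = features.get(reference, 0)
        if num > 0 and den > 0:
            ratios[sample] = math.log(num / den)
    return ratios


def mean_log_ratio_change(t1, t2, feature, reference):
    """Mean log-ratio at T2 minus mean log-ratio at T1.

    Assumes each time point has at least one sample with both counts nonzero.
    """
    r1 = list(log_ratios(t1, feature, reference).values())
    r2 = list(log_ratios(t2, feature, reference).values())
    return sum(r2) / len(r2) - sum(r1) / len(r1)


# Hypothetical read counts; "ref" stands in for the reference frame feature.
t1 = {"s1": {"g_A": 10, "ref": 100}, "s2": {"g_A": 20, "ref": 100}}
t2 = {"s1": {"g_A": 40, "ref": 100}, "s2": {"g_A": 80, "ref": 100}}
change = mean_log_ratio_change(t1, t2, "g_A", "ref")
```

Because both time points are expressed relative to the same reference feature, a positive change indicates that the feature increased relative to the reference, independently of differences in total sequencing depth.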

Tree Visualization
A phylogenetic tree visualization (21) was built on assigned taxonomy collapsed at the genus level. Changes in normalized feature counts (log-ratios on genera) between time points T1 and T2 were averaged across all sampled surfaces. Features that were exclusive to either time point (only 3.8% of all observed feature counts) were excluded from the quantitative tree visualization.

Media
The media used to reactivate the lyophilized strains were prepared as recommended by the provider, available on their website: https://www.dsmz.de/collection/catalogue/microorganisms/culture-technology/list-of-media-for-microorganisms. Medium 1 (nutrient broth) was used in this study for initial cultures of the three microorganisms.

Spent Coffee Ground (SCG)
Commercial ground coffee (Peet's Coffee dark roast, Major Dickason's Blend) was used for culturing the bacterial strains. The spent coffee ground (SCG) was prepared by brewing 15 g of ground coffee in 250 mL of distilled water, followed by sterilization (15 min at 121 °C). The brewed and sterile coffee was filtered using a Corning Filter System (ref 430756). The spent coffee ground was added to 12-well plates, 500 mg per well, resulting in three biological replicates per bacterial strain and an experimental control (SCG with culture media instead of bacterial inoculum).

Culture Conditions
The microorganisms were initially grown in 50 mL Erlenmeyer flasks containing 25 mL of medium 1 (nutrient broth) in a rotary shaker (MaxQ 4450, Thermo Scientific) at 200 rpm with a controlled temperature of 30 °C for 48 h. For each microorganism, a 500 µL microbial inoculum from a 48 h culture was transferred into 12-well plates containing SCG or medium 1 agar (nutrient agar) and incubated at 30 °C until required. Plates were used for time-course monitoring, resulting in samples for LC-MS/MS corresponding to time 0, 2, 4, 6 and 8 days.

Extraction of Metabolites
Following a time-course experiment (time 0, 2, 4, 6 and 8 days), plates containing microbial cultures were subjected to three freeze-thaw cycles of 10 minutes each. After that, an aliquot of 30-50 mg from SCG and agar was transferred to a 96-well plate. These samples were extracted with methanol, followed by sonication for 15 min (Branson 5510, Marshall Scientific, Hampton, NH, USA) and centrifugation for 15 min at 2000 rpm (865 x g) using a Sorvall Legend RT centrifuge (Marshall Scientific, Hampton, NH, USA). The obtained supernatant was transferred to a clean 96-well plate and dried in a centrifugal vacuum concentrator, Centrivap (Labconco, Kansas City, MO, USA). Samples were resuspended in 200 µL of 80% methanol:water containing an internal standard (1 µM sulfamethazine) for LC-MS/MS acquisition.
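The pairing of 2000 rpm with 865 x g above follows from the standard relation between rotor speed and relative centrifugal force, RCF = 1.118e-5 x r x rpm^2 with r in centimeters. The sketch below applies that relation; the rotor radius is not given in the text, so the value obtained here is back-calculated from the two reported numbers and should be read as an inference, not a specification.

```python
RCF_CONSTANT = 1.118e-5  # standard factor for radius in cm and speed in rpm


def rcf_from_rpm(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (x g) for a given speed and rotor radius."""
    return RCF_CONSTANT * radius_cm * rpm ** 2


def radius_from_rcf(rcf: float, rpm: float) -> float:
    """Rotor radius in cm implied by a reported (rpm, x g) pair."""
    return rcf / (RCF_CONSTANT * rpm ** 2)


# Radius implied by the reported 2000 rpm and 865 x g (inferred, not stated).
implied_radius_cm = radius_from_rcf(865.0, 2000.0)
round_trip_rcf = rcf_from_rpm(2000.0, implied_radius_cm)
```

Reporting the x g value alongside rpm, as the text does, allows the centrifugation step to be reproduced on a rotor of a different radius.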

Data and Code Availability
All data generated in this study are publicly available. The raw data are available in the MassIVE repository (massive.ucsd.edu) under the dataset accession numbers MSV000083320 and MSV000087141. The annotation and molecular networking were conducted using GNPS. The links to analyses are provided below.

Figure S1. a) Photographic evidence of the kitchen in the test house prior to human occupation. b) Adjacent surface swabbing scheme for paired metabolomics and microbiome analysis. c) An example of detection of piperine, a compound from black pepper, around the kitchen surfaces, in particular near the stove area and the front face of the dishwasher, at T1 evidences previous human activity. d) Digital rendering of the kitchen and neighboring work and dining stations. Circular, translucent targets record the locations of sampled surfaces. The kitchen/food preparation area is an epicenter of activity in human habitats across all cultures. Consequently, 104 spots were sampled around the kitchen, providing the highest mapping resolution in the entire house.

suggests it is likely excreted through urine. c) γ-Glutamyl-S-allylcysteine, a metabolite found in foods such as garlic, is mainly found in the kitchen. d) Chenodeoxycholic acid, a bile acid produced in the liver and modified by gut microbiota, is excreted in feces. Its wide distribution across the house at T2 is indicative of fecal matter spread by the inhabitants. e) Caffeine from coffee, tea, and soda beverages. At T1, it is localized to the kitchen; at T2, it also appears in the bathroom, likely due to excretion in urine. The corresponding spectral match is shown in Figure S5. f) An example of a GNPS spectral match for caffeine (https://gnps.ucsd.edu/ProteoSAFe/result.jsp?task=ba54bdcb526344bdb5d25fa4f77f4a15&view=view_all_annotations_DB#%7B%22main.Compound_Name_input%22%3A%22caffeine%22%7D).

Visualization of molecular composition of samples.
Each dot corresponds to a detected molecule classified using Qemistree (14) to the class level; the samples are sorted from most diverse to least diverse left-to-right, and the chemical classes from the most prevalent to least prevalent top-to-bottom. In a), the class-level annotations of each molecule are shown. Lipids, organic acids and derivatives, organic nitrogen compounds and benzenoids are among the most common chemical classes and are found across most of the samples. In b), the house location is shown; the samples from the kitchen are consistently more complex across the house and contain most of the molecules found in other locations. In c), the samples at time point T2 appear to be generally more chemically rich. d) and e): Nestedness Over Decreasing Fill (NODF) statistic measured on the observed data (NODF_OBSERVED) and compared to a null model (NODF_NULL_MEAN, 1000 randomizations, fixed sample counts). Significant nestedness is observed both d) within each location of the entire home and e) between time points T1 and T2. For each pair it is assessed whether the less rich sample of the pair is a nested version of the richest one, and hence nestedness could be in either direction. However, as the samples at T2 are generally richer than at T1, it is likely that T1 chemistry is nested in T2. This suggests that the molecules accumulate and the chemical makeup of the house expands from T1 to T2.

Figure S7. Molecular overlap of the house chemistry with molecules found in food. a) Flowchart of detected molecules associated with specific food sources. Some of the molecules could be contributed by humans themselves (especially those that overlap with animal sources). The plot informs on the occurrence of foods, the likelihood of a food to leave a molecular trace, and their ease of detection in MS. Coffee left the most pronounced trace of all foods.
A relatively large amount of overlap with apple cider may result from use of natural cleaning products for the scheduled cleaning activities. 3D maps on the right show the distribution of the molecules associated with three herbs (oregano, rosemary, basil) used in food preparation. As expected, they could be found in the kitchen and on table surfaces. b) Qemistree (29) plot representing the tree of distributions of molecular families across the house and the changes from Time 1 to Time 2. A portion of the tree is outlined to highlight an example of carboxylic acids/derivatives, one of the most ubiquitous molecular families found in the house. As some of these molecules may originate from cooking oils, they are found predominantly in the kitchen. However, the changes from Time 1 to Time 2 are dependent on the chemical family grouping, rather than being a uniform increase, which is indicative of multiple sources for these molecules and/or their transformations.

Figure S9. Changes in microbiome in the course of the HOMEChem campaign. a) Normalized change in read counts from T1 to T2 for microbial taxa collapsed at the genus level. A depletion of environment-associated microbial species and their replacement with human-associated ones can be illustrated by the Bacteroidetes phylum in a phylogenetic tree of microbial shifts; classes Cytophagia, Sphingobacteriia, and Flavobacteriia, typically regarded as environment-associated bacterial clades (30), all experienced a decrease in feature counts between T1 and T2. In contrast, the class Bacteroidia, which is known to be a host-associated microbial clade commonly seen in the human gut, saw an increase. b) FEAST analysis (28) indicates depletion of microbes associated with the environment (water, soil) and an increase in human-associated (skin, feces, oral) as well as human food-associated microbial species.

Figure S10.
a) Molecular network with mmvec conditional probability values for Hymenobacter represented by the node color (red: metabolite is positively associated with the microbe; yellow: not associated; blue: metabolite is negatively associated with the microbe). A greater node size corresponds to a higher absolute value of the mmvec conditional probability coefficient, i.e. the numerical value that describes the degree of microbe-metabolite association. Almost none of the most associated metabolites can be annotated. b) Example of one of the network clusters highlighted in panel (a) that contains multiple microbially-related metabolites for the Hymenobacter and Sphingomonas genera; mostly the same metabolites appear associated with both. The same cluster appears as important for multiple bacterial species in general. Such consistency suggests that these metabolites are core constituents of bacteria. In silico structure prediction (31) indicates that these compounds may be related to phospholipids, metabolites comprising bacterial cell walls (32). MASST (33) has revealed that compounds in this cluster and their analogues are widely distributed and can be observed in a variety of samples that contain bacteria, from microbial cultures to marine and soil samples. c) The network cluster shown in (b) is colored according to the compounds' abundances across T1 and T2. The abundances across time points are consistent across the nodes associated with the microbes, suggesting a related function and/or origin. The distribution of the metabolites in such clusters across the two time points informs on the shifts in the microbially-associated chemistries. d) An example of a network cluster with compounds that are related to one microbe (Stenotrophomonas) but not another (Corynebacterium). This scenario has not been encountered often in the present analysis.